17 research outputs found

    On the evolution of microbes: the evolution of genomes with respect to RNA folding

    We hypothesized that the stringency with which RNA folds (summarized in our analysis by the predicted folding free energy (FFE)) may be under selective pressure, presumably due to its role in (reverse) transcription and translation, and its potential effect on the RNA degradation rate. For bacteria, RNA folding will depend on the physical properties of their environment. For viruses, this balance needs to be reached for every host in which the virus successfully replicates, and may play a critical role in adapting to new hosts. In the influenza A virus, we have shown that the FFE of its polymerase genes evolves over time from lower to higher values every time an avian segment jumps into humans. We postulated that this may be related to the difference in body temperature between humans and birds, as genes isolated from avian sources generally have significantly lower FFE than the human isolates. Furthermore, we can use the FFE and amino acid sequence of the influenza A virus to classify whether a given virus is similar to others that can jump to and successfully infect human hosts. In bacteria, we have shown that, consistent with previous studies of GC content, tRNA FFE is linearly correlated with growth temperature, while mRNA FFE is not. Regardless, we showed that growth conditions are related to mRNA FFE distributions and function. Furthermore, there is a relation between mRNA FFE and half-life. Finally, we showed that gene expression can be predicted from RNA structure and sequence properties.
In studying RNA folding in both viruses and bacteria, we were able to view the possible association between FFE and environment in two ways. The number of bacterial genomes sequenced allows us to get a sense of which RNA structures and folding energies are required for bacteria to inhabit a wide variety of environments, everything from the human body to black smokers on the ocean floor, while the number of influenza A genomes sequenced allows us to determine how RNA structures change over time. By using both sets of information, we can get a clearer picture of both the importance of RNA structure and how RNA structure and folding energy evolve as the host environment changes.

    Data Leakage in Notebooks: Static Detection and Better Processes

    Data science pipelines that train and evaluate machine learning models may contain bugs just like any other code. Leakage between training and test data can lead to overestimating the model's accuracy during offline evaluations, possibly leading to deployment of low-quality models in production. Such leakage can happen easily by mistake or by following poor practices, but may be tedious and challenging to detect manually. We develop a static analysis approach to detect common forms of data leakage in data science code. Our evaluation shows that our analysis accurately detects data leakage and that such leakage is pervasive among over 100,000 analyzed public notebooks. We discuss how our static analysis approach can help both practitioners and educators, and how leakage prevention can be designed into the development process.
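To make the idea of checking notebook code statically more concrete, here is a toy sketch. It is not the analysis described in the abstract. It only flags `fit` or `fit_transform` calls that appear on an earlier line than the first `train_test_split` call, and it assumes scikit-learn-style naming. The function name `find_possible_leaks` and the variables in the sample snippets are hypothetical.

```python
import ast

# Toy heuristic (an assumption for illustration, not the paper's method):
# a fit-like call placed before the first train/test split may have seen
# test rows, which is one common form of preprocessing leakage.
FIT_METHODS = {"fit", "fit_transform"}


def find_possible_leaks(source):
    """Return sorted line numbers of fit-like calls preceding the first split."""
    tree = ast.parse(source)
    split_lines = []
    fit_lines = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Method calls carry the name in .attr, plain calls in .id.
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name == "train_test_split":
            split_lines.append(node.lineno)
        elif name in FIT_METHODS:
            fit_lines.append(node.lineno)
    if not split_lines:
        return []
    first_split = min(split_lines)
    return sorted(line for line in fit_lines if line < first_split)


# Hypothetical notebook cells, given as strings so nothing is executed.
LEAKY = """
X_scaled = scaler.fit_transform(X)
X_train, X_test = train_test_split(X_scaled)
"""

CLEAN = """
X_train, X_test = train_test_split(X)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
"""

print(find_possible_leaks(LEAKY))
print(find_possible_leaks(CLEAN))
```

A line-order check like this ignores data flow entirely, so it would miss leakage through other routes and could flag harmless code. It is meant only to show why source-level inspection can catch the pattern without running the notebook.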

    Regulatory Circuit of Human MicroRNA Biogenesis

    miRNAs (microRNAs) are a class of endogenous small RNAs that are thought to negatively regulate protein production. Aberrant expression of many miRNAs is linked to cancer and other diseases. Little is known about the factors that regulate the expression of miRNAs. We have identified numerous regulatory elements upstream of miRNA genes that are likely to be essential to the transcriptional and posttranscriptional regulation of miRNAs. Newly identified regulatory motifs occur frequently and in multiple copies upstream of miRNAs. The motifs are highly enriched in G and C nucleotides, in comparison with the nucleotide composition of miRNA upstream sequences. Although the motifs were predicted using sequences that are upstream of miRNAs, we find that 99% of the top-predicted motifs preferentially occur within the first 500 nucleotides upstream of the transcription start sites of protein-coding genes; the observed preference in location underscores the validity and importance of the motifs identified in this study. Our study also raises the possibility that a considerable number of well-characterized, disease-associated transcription factors (TFs) of protein-coding genes contribute to the abnormal miRNA expression in diseases such as cancer. Further analysis of predicted miRNA–protein interactions led us to hypothesize that TFs that include c-Myb, NF-Y, Sp-1, MTF-1, and AP-2α are master regulators of miRNA expression. Our predictions are a solid starting point for the systematic elucidation of the causative basis for aberrant expression patterns of disease-related (e.g., cancer) miRNAs. Thus, we point out that focused studies of the TFs that regulate miRNAs will be paramount in developing cures for miRNA-related diseases. The identification of the miRNA regulatory motifs was facilitated by a new computational method, K-Factor. K-Factor predicts regulatory motifs in a set of functionally related sequences, without relying on evolutionary conservation.

    The role of RNA folding free energy in the evolution of the polymerase genes of the influenza A virus

    RNA folding free energy is important for the evolution and host adaptation of the influenza virus. Human virus polymerase genes are shown to have substantially higher folding free energy values than their avian counterparts.

    Large Scale Comparison of Innate Responses to Viral and Bacterial Pathogens in Mouse and Macaque

    Viral and bacterial infections of the lower respiratory tract are major causes of morbidity and mortality worldwide. Alveolar macrophages line the alveolar spaces and are the first cells of the immune system to respond to invading pathogens. To determine the similarities and differences between the responses of mice and macaques to invading pathogens, we profiled alveolar macrophages from these species following infection with two viral (PR8 and Fuj/02 influenza A) and two bacterial (Mycobacterium tuberculosis and Francisella tularensis Schu S4) pathogens. Cells were collected at 6 time points following each infection, and expression profiles were compared across and between species. Our analyses identified a core set of genes, activated in both species and across all pathogens, that were predominantly part of the interferon response pathway. In addition, we identified similarities across species in the way innate immune cells respond to lethal versus non-lethal pathogens. On the other hand, we also found several species- and pathogen-specific response patterns. These results provide new insights into mechanisms by which the innate immune system responds to, and interacts with, invading pathogens.

    K-means clustering according to beta diversity.

    K-means clustering of NEC and non-NEC samples, according to beta diversity. Individual data points represent individual samples. The ovals around the points represent 2 standard deviations of the data, and the circle in the middle of each oval represents the center of the cluster. The Enterobacteriaceae cluster is shown in red, the Clostridium cluster in blue, and the Bacteroides cluster in green. The single yellow point represents sample 9B, and the magenta points form a heterogeneous cluster.
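For readers unfamiliar with the clustering step named in this caption, the following is a minimal sketch of plain k-means (Lloyd's algorithm) on 2-D points. It is an illustration under assumptions, not the procedure used for the figure: the coordinates are made up, the first `k` points seed the centroids for determinism, and real beta-diversity distances would first need to be turned into coordinates (for example by an ordination), which is not shown.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical 2-D coordinates standing in for ordinated samples.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The cluster centers returned here correspond to the circles described in the caption. The 2-standard-deviation ovals would be computed separately from the spread of each cluster's members.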

    Clinical characteristics of infants in this study.

    Matched samples are indicated by the letters in the label; for example, samples 11A, 11B, and 11C were derived from the same patient. Unknown data are indicated by the abbreviation UNK; NA, not applicable; DOL, day of life. Abbreviations for tissue types: J, jejunum; I, ileum; C, cecum; RC, right colon; TC, transverse colon; LC, left colon. Abbreviations for mode of delivery: CS, Cesarean section; V, vaginal delivery. Feeding refers to enteral nutrition received in the week prior to sample collection; abbreviations for feeding: BM, breast milk; FF, formula feed; C, combination breast and formula; NPO, no feeds. Perinatal antibiotics refers to the initial course of antibiotics consecutively administered to the baby after delivery. Abbreviations for antibiotics: A, ampicillin; G, gentamicin; V, vancomycin; Cl, clindamycin; Cf, cefotaxime. Preoperative antibiotic days refers to the number of consecutive days on which antibiotics were administered immediately preceding sample collection. This excludes antibiotics administered at the time of surgery.

    Microbial burden and population diversity.

    Sample-by-sample measurements of (A) relative concentration of 16S gene sequences as determined by qPCR and (B) Shannon diversity indices. Shapes indicate heavy antibiotic exposure (diamonds) immediately prior to sample collection, low antibiotic exposure (squares), or no antibiotics (triangles).
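As a reminder of what panel (B) measures, the Shannon diversity index of a sample is H = -Σ p_i ln p_i, where p_i is the fraction of reads assigned to taxon i. The sketch below computes it from raw counts. The natural logarithm is an assumption, since the caption does not state the log base, and the counts are invented for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    # Taxa with zero counts contribute nothing and are skipped to avoid log(0).
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical per-taxon read counts for two samples.
even_sample = [10, 10, 10, 10]   # four equally abundant taxa
single_taxon = [40, 0, 0, 0]     # one taxon dominates completely

print(shannon_index(even_sample))
print(shannon_index(single_taxon))
```

An evenly spread community gives the maximum value for its number of taxa (ln 4 for four taxa), while a sample dominated by a single taxon gives zero.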